Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice
Abstract
It is well known that the initialization of weights in deep neural networks can have a dramatic impact on learning speed. For example, ensuring the mean squared singular value of a network’s input-output Jacobian is O(1) is essential for avoiding the exponential vanishing or explosion of gradients. The stronger condition that all singular values of the Jacobian concentrate near 1 is a property known as dynamical isometry. For deep linear networks, dynamical isometry can be achieved through orthogonal weight initialization and has been shown to dramatically speed up learning; however, it has remained unclear how to extend these results to the nonlinear setting. We address this question by employing powerful tools from free probability theory to compute analytically the entire singular value distribution of a deep network’s input-output Jacobian. We explore the dependence of the singular value distribution on the depth of the network, the weight initialization, and the choice of nonlinearity. Intriguingly, we find that ReLU networks are incapable of dynamical isometry. On the other hand, sigmoidal networks can achieve isometry, but only with orthogonal weight initialization. Moreover, we demonstrate empirically that deep nonlinear networks achieving dynamical isometry learn orders of magnitude faster than networks that do not. Indeed, we show that properly initialized deep sigmoidal networks consistently outperform deep ReLU networks. Overall, our analysis reveals that controlling the entire distribution of Jacobian singular values is an important design consideration in deep learning.
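The quantity the abstract discusses, the singular value spectrum of the input-output Jacobian, can be inspected numerically. The following is a minimal NumPy sketch, not the free-probability calculation used in the paper. The function names (`orthogonal`, `gaussian`, `jacobian_singular_values`), the square fully connected architecture without biases, and the chosen width and depth are all assumptions made for illustration.

```python
import numpy as np

def orthogonal(n, rng):
    # Hypothetical helper: random orthogonal matrix from the QR
    # decomposition of a Gaussian matrix (column signs fixed via diag(R)).
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))

def gaussian(n, rng):
    # Hypothetical helper: i.i.d. Gaussian weights with variance 1/n.
    return rng.standard_normal((n, n)) / np.sqrt(n)

def jacobian_singular_values(init, depth, width, phi, dphi, seed=0):
    # Forward-propagate a random input through `depth` layers x -> phi(W x)
    # and accumulate the Jacobian J = D_L W_L ... D_1 W_1, where D_l is the
    # diagonal matrix of phi'(pre-activations) at layer l.
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(width)
    J = np.eye(width)
    for _ in range(depth):
        W = init(width, rng)
        h = W @ x
        x = phi(h)
        J = (dphi(h)[:, None] * W) @ J
    return np.linalg.svd(J, compute_uv=False)

def identity(h):
    return h

def identity_grad(h):
    return np.ones_like(h)

def tanh_grad(h):
    return 1.0 - np.tanh(h) ** 2

if __name__ == "__main__":
    depth, width = 20, 64  # assumed sizes, chosen only to keep the run fast
    cases = [
        ("linear, orthogonal", orthogonal, identity, identity_grad),
        ("linear, Gaussian", gaussian, identity, identity_grad),
        ("tanh, orthogonal", orthogonal, np.tanh, tanh_grad),
        ("tanh, Gaussian", gaussian, np.tanh, tanh_grad),
    ]
    for name, init, phi, dphi in cases:
        s = jacobian_singular_values(init, depth, width, phi, dphi)
        print(f"{name:20s} min={s.min():.3e} max={s.max():.3e}")
```

In the deep linear case with orthogonal weights the Jacobian is itself orthogonal, so every singular value equals 1 up to floating-point error, which is the isometry the abstract refers to. The Gaussian case gives a widely spread spectrum. The tanh rows are included only so the spread can be examined under the two initializations; this sketch does not tune the weight scale or add biases, so it should not be read as reproducing the paper's results.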